Evolutionary Algorithms for Finding Interpretable Patterns in Gene Expression Data
Authors
Abstract
Microarray technology allows us to measure the expression of thousands of genes simultaneously under specific conditions. Clustering is the main tool used to analyze gene expression data obtained from microarray experiments. By grouping together genes that behave similarly across samples, the resulting clusters suggest new functions for some of the genes. Non-exclusive clustering algorithms are required, because a gene may have more than one biological function. Gene Shaving (Hastie et al. 2000) is a clustering algorithm that looks for coherent clusters with high variance across samples and allows clusters to overlap. In this paper we present two Evolutionary Algorithm approaches, based on Genetic Algorithms (GA) and Estimation of Distribution Algorithms (EDA), whose aim is to find clusters of similar genes with large between-sample variance. We apply our methods, GA-Shaving and EDA-Shaving, to the S. cerevisiae cell cycle dataset, outperforming Gene Shaving in terms of the quality and size of the clusters obtained. Furthermore, we use GO Term Finder (Boyle et al. 2004) to evaluate the biological interpretation of the results. It computes the most statistically significant biological processes associated with every cluster by means of the Gene Ontology annotations (Gene Ontology Consortium 2004).
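The objective stated in the abstract (coherent clusters whose mean expression varies strongly across samples) can be illustrated with a small sketch. It assumes the usual between/within variance decomposition associated with Gene Shaving; the function name `cluster_variances` and the boolean-mask encoding of a candidate cluster are hypothetical choices made for illustration, and this is not claimed to be the fitness function used by GA-Shaving or EDA-Shaving.

```python
import numpy as np

def cluster_variances(X, mask):
    """Score a candidate gene cluster (illustrative sketch only).

    X    : 2-D array, rows are genes, columns are samples.
    mask : boolean vector, True for genes that belong to the cluster.
    Returns (between-sample variance, within variance, ratio between/total).
    """
    S = X[mask]                          # expression rows of the cluster genes
    col_mean = S.mean(axis=0)            # cluster mean profile, one value per sample
    v_between = col_mean.var()           # how much the mean profile varies across samples
    v_within = ((S - col_mean) ** 2).mean()  # spread of genes around the mean profile
    v_total = v_between + v_within
    ratio = v_between / v_total if v_total > 0 else 0.0
    return v_between, v_within, ratio

# Toy data: two genes with identical oscillating profiles and one flat gene.
X = np.array([[1.0, -1.0, 1.0, -1.0],
              [1.0, -1.0, 1.0, -1.0],
              [0.0,  0.0, 0.0,  0.0]])
tight = cluster_variances(X, np.array([True, True, False]))
loose = cluster_variances(X, np.array([True, True, True]))
```

In this toy case, adding the flat gene lowers both the between-sample variance and the ratio, which is the kind of trade-off a search over gene subsets has to balance. A binary mask like this is also a natural individual encoding for a GA or an EDA, although the source does not say which encoding the authors use.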
Similar resources
SECURING INTERPRETABILITY OF FUZZY MODELS FOR MODELING NONLINEAR MIMO SYSTEMS USING A HYBRID OF EVOLUTIONARY ALGORITHMS
In this study, a Multi-Objective Genetic Algorithm (MOGA) is utilized to extract interpretable and compact fuzzy rule bases for modeling nonlinear Multi-input Multi-output (MIMO) systems. In the process of nonlinear system identification, structure selection, parameter estimation, model performance and model validation are important objectives. Furthermore, securing low-level and high-level ...
Finding Similar Patterns in Microarray Data
In this paper we propose a clustering algorithm called sCluster for the analysis of gene expression data based on pattern similarity. The algorithm captures tight clusters exhibiting strongly similar expression patterns in microarray data, and allows a high level of overlap among discovered clusters without forcing every gene into a cluster as other algorithms do. This reflects the biological fact that...
Investigating the Role of Factors Affecting the Frequency of Incidents in Water Supply Mains Using a Hybrid Regression Model
A water distribution network is one of the important parts of infrastructure systems. The efficient management and proactive planning of capital investment of these assets are fundamental for efficient and effective service delivered by water companies. The direct economic costs (i.e. rehabilitation investment, repair costs, water loss, etc.) as well as indirect costs (i.e. service and traffic ...
Study of Evolutionary and Swarm Intelligent Techniques for Soccer Robot Path Planning
Finding an optimal path for a robot on a soccer field involves different parameters such as the position of the robot, the positions of the obstacles, etc. Because of its simplicity and smoothness, the Ferguson spline has been employed by many research teams for path planning between arbitrary points on the field. In order to optimize the parameters of Ferguson Spline some evolutionary or intelligent al...
Optimization of sediment rating curve coefficients using evolutionary algorithms and unsupervised artificial neural network
The sediment rating curve (SRC) is a conventional and common regression model for estimating suspended sediment load (SSL) from flow discharge. However, in most cases the log-transformation of the data in SRC models causes a bias that underestimates SSL predictions. In this study, using the daily stream flow and suspended sediment load data from Shalman hydrometric station on Shalmanroud River, Guilan Pro...